Dsolve - Morphological Segmentation for German Using Conditional Random Fields

Authors

  • Kay-Michael Würzner
  • Bryan Jurish
Abstract

We describe Dsolve, a system for the segmentation of morphologically complex German words into their constituent morphs. Our approach treats morphological segmentation as a classification task, in which the locations and types of morph boundaries are predicted by a Conditional Random Field model trained from manually annotated data. The prediction of morph-boundary types in addition to their locations distinguishes Dsolve from similar approaches previously suggested in the literature. We show that the use of boundary types provides a (somewhat counter-intuitive) performance boost with respect to the simpler task of predicting only segment locations.
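The classification view described in the abstract can be illustrated with a minimal sketch: a segmented word is turned into one label per character, where the label records whether a morph boundary follows that character and, if so, which type. The label names (`"P"`, `"S"`, `"0"`, `"1"`), the function names, and the example word are assumptions made for illustration only; they are not taken from Dsolve, and the sketch covers only the label encoding, not the CRF training itself.

```python
# Hypothetical label for "no boundary after this character".
# The source does not list Dsolve's actual label inventory.
NO_BOUNDARY = "0"


def encode(morphs, boundary_types):
    """Convert a segmented word into (character, label) pairs.

    Each character is labeled with the type of the morph boundary that
    follows it, or NO_BOUNDARY if no boundary follows.
    """
    if any(not m for m in morphs):
        raise ValueError("morphs must be non-empty strings")
    if len(boundary_types) != len(morphs) - 1:
        raise ValueError("need exactly one boundary type between adjacent morphs")
    pairs = []
    for i, morph in enumerate(morphs):
        for j, ch in enumerate(morph):
            ends_morph = j == len(morph) - 1
            is_final_morph = i == len(morphs) - 1
            if ends_morph and not is_final_morph:
                pairs.append((ch, boundary_types[i]))
            else:
                pairs.append((ch, NO_BOUNDARY))
    return pairs


def decode(pairs):
    """Recover the morphs and boundary types from (character, label) pairs."""
    morphs, types, current = [], [], ""
    for ch, label in pairs:
        current += ch
        if label != NO_BOUNDARY:
            morphs.append(current)
            types.append(label)
            current = ""
    morphs.append(current)
    return morphs, types


def strip_types(pairs):
    """Collapse typed labels to a binary boundary/no-boundary scheme,
    i.e. the simpler task of predicting only boundary locations."""
    return [(ch, NO_BOUNDARY if label == NO_BOUNDARY else "1") for ch, label in pairs]


# Illustrative word and boundary types (assumed, not from the paper's data).
typed = encode(["un", "glück", "lich"], ["P", "S"])
untyped = strip_types(typed)
```

A sequence labeler would then be trained on either the typed or the untyped label sequences; the abstract's finding is that the typed variant performs better even when only boundary locations are evaluated.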


Similar resources

Painless Semi-Supervised Morphological Segmentation using Conditional Random Fields

We discuss data-driven morphological segmentation, in which word forms are segmented into morphs, that is the surface forms of morphemes. We extend a recent segmentation approach based on conditional random fields from purely supervised to semi-supervised learning by exploiting available unsupervised segmentation techniques. We integrate the unsupervised techniques into the conditional random f...


Studies for Segmentation of Historical Texts: Sentences or Chunks?

We present some experiments on text segmentation for German texts aimed at developing a method of segmenting historical texts. Since such texts have no (consistent) punctuation, we use a machine learning approach to label tokens with their relative positions in text segments using Conditional Random Fields. We compare the performance of this approach on the task of segmenting of text into sente...


Tagging Complex Non-Verbal German Chunks with Conditional Random Fields

We report on chunk tagging methods for German that recognize complex non-verbal phrases using structural chunk tags with Conditional Random Fields (CRFs). This state-of-the-art method for sequence classification achieves 93.5% accuracy on newspaper text. For the same task, a classical trigram tagger approach based on Hidden Markov Models reaches a baseline of 88.1%. CRFs allow for a clean and p...


A Conditional Random Field Framework for Thai Morphological Analysis

This paper presents a framework for Thai morphological analysis based on the theoretical background of conditional random fields. We formulate morphological analysis of an unsegmented language as the sequential supervised learning problem. Given a sequence of characters, all possibilities of word/tag segmentation are generated, and then the optimal path is selected with some criterion. We exami...


Supervised Morphological Segmentation in a Low-Resource Learning Setting using Conditional Random Fields

We discuss data-driven morphological segmentation, in which word forms are segmented into morphs, the surface forms of morphemes. Our focus is on a low-resource learning setting, in which only a small number of annotated word forms are available for model training, while unannotated word forms are available in abundance. The current state-of-the-art methods 1) exploit both the annotated and unannota...




Publication date: 2015